
    TiMDPpoly: An Improved Method for Solving Time-Dependent MDPs

    We introduce TiMDPpoly, an algorithm designed to solve planning problems with durative actions, under probabilistic uncertainty, in a non-stationary, continuous-time context. Mission planning for autonomous agents such as planetary rovers or unmanned aircraft often corresponds to such time-dependent planning problems. These problems can be cast in the framework of Time-dependent Markov Decision Processes (TiMDPs). We analyze the TiMDP optimality equations in order to exploit their properties. Then, we focus on the class of piecewise polynomial models in order to approximate TiMDPs, and introduce several algorithmic contributions which lead to the TiMDPpoly algorithm. Finally, our approach is evaluated on an unmanned aircraft mission planning problem and on an adapted version of the well-known Mars rover domain.
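    For context, the TiMDP optimality equations analyzed here take roughly the following form in Boyan and Littman's notation (reproduced as a reader's aid, not verbatim from the paper):

        V(s,t)       = \sup_{t'' \ge t} \big[ \int_t^{t''} K(s,\theta)\, d\theta + \bar{V}(s,t'') \big]
        \bar{V}(s,t) = \max_a Q(s,t,a)
        Q(s,t,a)     = \sum_{\mu} L(\mu \mid s,t,a)\, U(\mu,t)
        U(\mu,t)     = \int P_\mu(t')   \, [ R(\mu,t,t') + V(s'_\mu,t') ]\, dt'    (absolute-time outcomes)
        U(\mu,t)     = \int P_\mu(t'-t) \, [ R(\mu,t,t') + V(s'_\mu,t') ]\, dt'    (relative-duration outcomes)

    The piecewise polynomial approach rests, roughly, on the fact that when the duration distributions P_\mu and rewards R are piecewise polynomial, the backups above (products, convolutions, pointwise maxima and suprema) can be carried out within that same class.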

    An Improved Approximate Policy Iteration Algorithm for Generalized Semi-Markov Decision Processes

    The complexity of time-dependent decision problems under uncertainty often stems from the interaction of several concurrent processes. Generalized Semi-Markov Decision Processes (GSMDPs) are an efficient and elegant formalism for representing both the concurrency of events and actions and uncertainty. We propose a GSMDP formalism extended with observable time and a hybrid state space. On this basis, we introduce a new algorithm inspired by approximate policy iteration to build efficient policies. This algorithm relies on simulation-guided exploration and uses support vector learning techniques. We illustrate this algorithm on an example and propose an improved version that compensates for its main weakness.

    Extending the Bellman equation for MDPs to continuous actions and continuous time in the discounted case

    Recent work on Markov Decision Processes (MDPs) covers the use of continuous variables and resources, including time. This work is usually done in a framework of bounded resources and finite temporal horizon, for which a total reward criterion is often appropriate. However, most of this work considers discrete effects on continuous variables, whereas modeling variables as continuous often allows for a parametric (possibly continuous) quantification of action effects. Moreover, infinite-horizon MDPs often use discounted criteria in order to ensure convergence and to account for the difference between a reward obtained now and a reward obtained later. In this paper, we build on the standard MDP framework and extend it to continuous time and resources and to the corresponding parametric actions. We aim to provide a framework and a sound set of hypotheses under which a classical Bellman equation holds in the discounted case, for parametric continuous actions and hybrid state spaces, including time. We illustrate our approach by applying it to the TMDP representation of Boyan and Littman.
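    As a rough illustration (the notation here is illustrative, not necessarily the paper's), the discounted fixed-point equation being established has the form

        V(s,t) = \sup_{a \in A,\, \theta \in \Theta_a} \mathbb{E}\big[ \gamma^{\tau - t} \, ( r(s,t,a(\theta),s',\tau) + V(s',\tau) ) \mid s, t, a(\theta) \big], \quad \tau \ge t,

    where a(\theta) is an action instantiated with parameter \theta, \tau is the random date at which the resulting transition completes, and the discount \gamma^{\tau - t} accounts for the elapsed continuous time.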

    Approximate Policy Iteration for Generalized Semi-Markov Decision Processes: an Improved Algorithm

    In the context of time-dependent problems of planning under uncertainty, most of the problem's complexity comes from the concurrent interaction of simultaneous processes. Generalized Semi-Markov Decision Processes (GSMDPs) are an efficient formalism for capturing both the concurrency of events and actions and uncertainty. We introduce GSMDPs with observable time and a hybrid state space, and present a new algorithm based on Approximate Policy Iteration to generate efficient policies. This algorithm relies on simulation-based exploration and makes use of SVM regression. We experimentally illustrate the strengths and weaknesses of this algorithm and propose an improved version that addresses the weaknesses highlighted by the experiments.
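    A minimal sketch of this kind of simulation-based approximate policy iteration loop, assuming a generic simulator with sample_state() and step(state, action) methods and using scikit-learn's SVR as the regressor (all interfaces below are illustrative, not the paper's implementation):

        import random
        from sklearn.svm import SVR

        def rollout(sim, state, policy, gamma, horizon):
            """Monte-Carlo estimate of the discounted return of `policy` from `state`."""
            total, discount = 0.0, 1.0
            for _ in range(horizon):
                state, reward, dt = sim.step(state, policy(state))
                total += discount * reward
                discount *= gamma ** dt          # discount by elapsed continuous time
            return total

        def greedy_action(sim, actions, value, gamma, state, n_samples=10):
            """One-step lookahead through the simulator against the regressed value."""
            def score(a):
                outcomes = [sim.step(state, a) for _ in range(n_samples)]
                return sum(r + gamma ** dt * value.predict([nxt])[0]
                           for nxt, r, dt in outcomes) / n_samples
            return max(actions, key=score)

        def approximate_policy_iteration(sim, actions, gamma=0.95,
                                         n_states=500, horizon=50, n_iterations=10):
            policy = lambda s: random.choice(actions)           # arbitrary initial policy
            for _ in range(n_iterations):
                # Evaluation: sample states by simulation, label them with rollouts.
                states = [sim.sample_state() for _ in range(n_states)]
                returns = [rollout(sim, s, policy, gamma, horizon) for s in states]
                value = SVR(kernel="rbf").fit(states, returns)  # SVM regression of V
                # Improvement: act greedily with respect to the regressed value.
                policy = lambda s, v=value: greedy_action(sim, actions, v, gamma, s)
            return policy

    Here states are numeric feature vectors that include the time variable, which is what lets a single regressor cover the hybrid state space.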

    Adapting an MDP planner to time-dependency: case study on a UAV coordination problem

    In order to allow the temporal coordination of two independent communicating agents, one needs to be able to plan in a time-dependent environment. This paper deals with modeling and solving such problems through the use of Time-dependent Markov Decision Processes (TiMDPs). We provide an analysis of the TiMDP model and exploit its properties to introduce an improved asynchronous value iteration method. Our approach is evaluated on a UAV temporal coordination problem and on the well-known Mars rover domain.
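    For readers who have not seen it, asynchronous value iteration differs from the synchronous variant by updating states in place, so later backups within a sweep reuse earlier ones. A minimal sketch on a finite discretization (the actual TiMDP method manipulates continuous, piecewise-defined value functions; this is illustrative only):

        def asynchronous_value_iteration(states, actions, transitions, reward,
                                         gamma=0.95, sweeps=100):
            """Gauss-Seidel value iteration: each backup immediately reuses the
            freshest values instead of waiting for a full synchronous pass."""
            V = {s: 0.0 for s in states}
            for _ in range(sweeps):
                for s in states:                  # in-place, state by state
                    V[s] = max(
                        sum(p * (reward(s, a, s2) + gamma * V[s2])
                            for s2, p in transitions(s, a))
                        for a in actions(s)
                    )
            return V

    For a time-dependent problem, the states would typically be (discrete state, time slice) pairs, and the order of the sweep over time slices is one of the levers such methods can exploit.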

    XMDP: a model for temporal planning under uncertainty with parametric actions

    Some decision problems involve choosing not only which actions to take but also which parameters to assign to those actions. For example, the action "move forward" often needs an associated distance. In the setting of decision under uncertainty, we propose to extend the MDP model to take into account parametric actions whose parameter is a decision variable. We establish the optimality equations for these parametric MDPs and extend the known results for classical MDPs. The time variable has a special place in this model; we detail its properties and relate them to previous work on temporal planning under uncertainty and on MDPs with hybrid state spaces.
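    In symbols (notation ours, for illustration), the parametric optimality equation extends the classical Bellman equation with an inner optimization over the action's parameter:

        V(s) = \sup_{a \in A}\ \sup_{\theta \in \Theta_a} \sum_{s'} p(s' \mid s, a(\theta)) \, \big[ r(s, a(\theta), s') + \gamma\, V(s') \big]

    For the "move forward" example, a is the move action and \theta the chosen distance d \in [0, d_max].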

    A Simulation-based Approach for Optimizing Generalized Semi-Markov Decision Processes

    Time is a crucial variable in planning and often requires special attention, since it introduces a specific structure along with additional complexity, especially in the case of decision under uncertainty. In this paper, after reviewing and comparing MDP frameworks designed to deal with temporal problems, we focus on Generalized Semi-Markov Decision Processes (GSMDPs) with observable time. We highlight the inherent structure and complexity of these problems and present the differences with classical reinforcement learning problems. Finally, we introduce a new simulation-based reinforcement learning method for solving GSMDPs, bringing together results from simulation-based policy iteration, regression techniques and simulation theory. We illustrate our approach on a subway network control example.

    A Simulation-based Approach for Solving Temporal Markov Problems

    Time is a crucial variable in planning and often requires special attention, since it introduces a specific structure along with additional complexity, especially in the case of decision under uncertainty. In this paper, after reviewing and comparing MDP frameworks designed to deal with temporal problems, we focus on Generalized Semi-Markov Decision Processes (GSMDPs) with observable time. We highlight the inherent structure and complexity of these problems and present the differences with classical reinforcement learning problems. Finally, we introduce a new simulation-based reinforcement learning method for solving GSMDPs, bringing together results from simulation-based policy iteration, regression techniques and simulation theory. We illustrate our approach on a subway network control example.
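    To make the GSMDP structure concrete, here is a minimal next-event simulation step for concurrent events, each with its own clock (an illustrative sketch; the paper's simulator and subway model are necessarily richer):

        def simulate_gsmdp(state, enabled, sample_delay, apply_event, policy, horizon):
            """Simulate one GSMDP trajectory: concurrent events race, the soonest fires.
            `enabled(state)` lists the active events; `sample_delay(e, state)` draws a
            duration from event e's (possibly non-memoryless) distribution."""
            t = 0.0
            clocks = {e: t + sample_delay(e, state) for e in enabled(state)}
            trajectory = [(t, state)]
            while clocks and t < horizon:
                event = min(clocks, key=clocks.get)           # soonest event fires
                t = clocks[event]
                state = apply_event(state, event, policy(state, t))
                clocks = {
                    # Still-enabled events keep their old clocks; this memory
                    # across transitions is what makes the process semi-Markov.
                    e: clocks[e] if (e in clocks and e != event)
                       else t + sample_delay(e, state)
                    for e in enabled(state)
                }
                trajectory.append((t, state))
            return trajectory

    Trajectories of this kind are exactly what a simulation-based policy evaluation step consumes.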

    An approach to handling time in the MDP framework: three methods for partitioning the time line

    Many planning problems take place in a non-stationary environment. In the setting of decision under uncertainty over an infinite horizon, for problems that become stationary in the limit, we define a modeling framework derived from the SMDP model in which the time variable is observable by the agent. Within this framework, we develop three different solution approaches in order to generate policies which, in every discrete state of the system, specify the optimal action to take as a function of the current date.
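    The three partitioning schemes are not named in this abstract, but the shared underlying idea can be sketched: cut the time line into intervals and augment the discrete state with the current interval, so that a stationary solver applies (illustrative only, with hypothetical helper names):

        def augment_with_time(states, breakpoints):
            """Cross each discrete state with a time interval [t_i, t_{i+1}) so a
            stationary MDP solver can run on the augmented (state, interval) space."""
            intervals = list(zip(breakpoints[:-1], breakpoints[1:]))
            return [(s, itv) for s in states for itv in intervals]

        def action_at(policy, state, t, breakpoints):
            """Read the optimal action for `state` at date `t` from an augmented policy."""
            for lo, hi in zip(breakpoints[:-1], breakpoints[1:]):
                if lo <= t < hi:
                    return policy[(state, (lo, hi))]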

    Viral to metazoan marine plankton nucleotide sequences from the Tara Oceans expedition

    A unique collection of oceanic samples was gathered by the Tara Oceans expeditions (2009-2013), targeting plankton organisms ranging from viruses to metazoans, and providing rich environmental context measurements. Thanks to recent advances in the field of genomics, extensive sequencing has been performed for a deep genomic analysis of this huge collection of samples. A strategy based on different approaches, such as metabarcoding, metagenomics, single-cell genomics and metatranscriptomics, has been chosen for analysis of size-fractionated plankton communities. Here, we provide detailed procedures applied for genomic data generation, from nucleic acids extraction to sequence production, and we describe registries of genomics datasets available at the European Nucleotide Archive (ENA, www.ebi.ac.uk/ena). The association of these metadata with the experimental procedures applied for their generation will help the scientific community to access these data and facilitate their analysis. This paper complements other efforts to provide a full description of experiments and open science resources generated from the Tara Oceans project, further extending their value for the study of the world's planktonic ecosystems.
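    As a practical pointer, such registries can be queried programmatically through the ENA Portal API; a minimal sketch (the accession below is a placeholder, and the field list should be checked against the ENA documentation):

        import requests

        # ENA Portal API search endpoint; see https://www.ebi.ac.uk/ena/portal/api/ for docs.
        url = "https://www.ebi.ac.uk/ena/portal/api/search"
        params = {
            "result": "read_run",                    # one record per sequencing run
            "query": 'study_accession="PRJEBXXXX"',  # placeholder: a Tara Oceans study accession
            "fields": "run_accession,fastq_ftp",     # fields to retrieve
            "format": "tsv",
        }
        print(requests.get(url, params=params).text)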